A large accessory genome and high recombination rates may influence global distribution and broad host range of the fungal plant pathogen Claviceps purpurea

Pangenome analyses are increasingly being utilized to study the evolution of eukaryotic organisms. While pangenomes can provide insight into polymorphic gene content, inferences about the ecological and adaptive potential of such organisms also need to be accompanied by additional supportive genomic analyses. In this study we constructed a pangenome of Claviceps purpurea from 24 genomes and examined the positive selection and recombination landscape of an economically important fungal organism for pharmacology and agricultural research. Together, these analyses revealed that C. purpurea has a relatively large accessory genome (~ 38%), high recombination rates (ρ = 0.044), and transposon mediated gene duplication. However, due to observations of relatively low transposable element (TE) content (8.8%) and a lack of variability in genome sizes, prolific TE expansion may be controlled by frequent recombination. We additionally identified that within the ergoline biosynthetic cluster the lpsA1 and lpsA2 were the result of a recombination event. However, the high recombination rates observed in C. purpurea may be influencing an overall trend of purifying selection across the genome. These results showcase the use of selection and recombination landscapes to identify mechanisms contributing to pangenome structure and primary factors influencing the evolution of an organism.


Introduction
Pangenomes can provide useful insight into a species distribution and lifestyle through examination of gene functional diversity, abundance, and distribution into core and accessory genomes. These variations often provide fitness advantages and promote adaptive evolution of the organism [1][2][3]. In prokaryotes the existence of more open pangenomes (large accessory) has been suggested to be the result of adaptive evolution that allows organisms, with large effective population sizes, to migrate into new ecological niches [4]. Whereas closed pangenomes (larger core) are found to be associated with more obligate and specialized organisms [4]. Similar results have been identified in fungal species, where a range of saprotrophic to opportunistic yeasts were found to have accessory genomes representing~9-19% of the genes [5], while Zymoseptoria tritici, a global wheat pathogen, has 40% of genes in the accessory genome [6]. This increase in the Z. tritici accessory genome reflects the global distribution of this pathogen that must continuously adapt to overcome new host resistances and multiple cycles of annual fungicide applications [6,7]. While the identification of pangenome sizes provide valuable knowledge of polymorphic gene content, which can be used to infer the lifestyle of the species [4], a combination of pangenomic and alternative genomic analyses provide a deeper understanding of the primary factors that are contributing to pangenome structure and the adaptive trajectory of the organism. Claviceps purpurea is a biotrophic ascomycete plant pathogen that has a specialized ovarian-specific non-systemic lifestyle with its grass hosts [8]. Despite the specialized infection pattern, C. purpurea has a broad host range of~400 grass species across 8 grass tribes, including economically important cereal crops such as wheat, barley, and rye and has a global distribution [8]. However, the mechanisms that underlie the evolutionary success of this species is still understudied. Unlike other pathogens of cereal crops, researchers have been unsuccessful in identifying qualitative resistance genes in crop or wild grass varieties [9][10][11]. Menzies et al. [9] noted the potential for a complex virulence and host susceptibility relationship of C. purpurea on durum and hexaploid wheat varieties, however, virulence was determined if sclerotia weighed > 81 mg; indicating that C. purpurea is able to initiate its biotrophic interaction but might be arrested during the final stages of sclerotia development. During infection the fungus does not induce necrosis or hypersensitive response (host mediated cell death) in its host, instead it actively manages to maintain host cell viability to obtain nutrients from living tissue through a complex cross-talk of fungal cytokinin production [12][13][14][15][16]. Furthermore, Wyka et al. [17] revealed evidence of tandem gene duplication occurring in genes often associated with pathogenicity or evasion of host defenses (effectors), which may provide insight into the success of the species. However, the factors influencing these duplication events remain unclear.
Claviceps purpurea is also known for its diverse secondary metabolite profile of ergot alkaloids and pigments [18][19][20][21]. Fungal secondary metabolites can play important roles in planthost interactions as virulence factors but can also increase the fitness of the fungus through stress tolerance [8,22,23]. It was recently postulated that the evolution of C. purpurea was associated with a host jump and subsequent adaptation and diversification to cooler, more open habitats [8,17]. In addition, likely due to the toxicity of ergot alkaloids, grass grazing mammals showed avoidance in grazing grass infected with C. purpurea, suggesting a potential for beneficial effects for the host plant [24]. This along with other evidence of neutral to positive effects of infection to host plants [25,26] suggest that C. purpurea is a conditional defensive mutualist [24].
In this study, we implement a comprehensive population genomic analysis to gain a deeper understanding of factors governing the evolution and adaptive potential of C. purpurea. Using 24 isolates, from six countries and three continents, we constructed the pangenome and subsequently identified genes with signatures of positive selection. Full genome alignments were further utilized to estimate population recombination rates and predict recombination hotspots. We observed a large accessory genome which may be influenced by a large effective population size and high recombination rates, which subsequently influence an overall trend of purifying selection and likely help defend against TE expansion. In addition, we observed that the lpsA1 and lpsA2 genes of the well-known ergoline biosynthetic cluster were likely the result of a recombination event.

Pangenome analysis
The pangenome was constructed using OrthoFinder v2.3.3 [44], on all genes identified from the 24 genomes, to infer groups of orthologous gene clusters (orthogroups). OrthoFinder was run using BLASTp on default settings. For downstream analysis, gene clusters were classified as secreted, predicted effectors, transmembrane, secondary (2˚) metabolites, carbohydratedegrading enzymes (CAZys), proteases (MEROPs), and conserved domain (conserved) clusters if � 50% of the strains present in a gene cluster had at least one protein classified as such. Gene clusters not grouped into any of the above categories were categorized as unclassified.
Core and pangenome size curves were extrapolated from resampling of 24 random possible combinations for each pangenome size of 1-24 genomes and modelled by fitting the power law regression formula: y = Ax B + C using the curve_fit function in the Python module Scipy v1.4.1. These processes were automated through the creation of a custom python script (https://github.com/PlantDr430/FunFinder_Pangenome).

Positive selection
To investigate the positive selection landscape of genes we utilized the 53 isolates (22 species) of the Claviceps genus [17, 20; NCBI BioProject: PRJNA528707] and found single-copy orthologs using OrthoFinder v2.3.3 with BLASTp on default settings. A total of 3,628 single-copy orthologs were identified (See Table 2 for detailed report). For each ortholog cluster sequences were aligned using MUSCLE v3.8.1551 [45] on default settings and values of dN, dS, and dN/dS (omega, ω) were estimated using the YN00 [46] method in PAML v4.8 using default parameters.
For statistical purposes, each gene cluster was only characterized by one functional category in the order displayed in Table 2 (i.e. secreted genes are those not already classified as effectors, etc) (See Methods section Statistical analyses and plotting).

Genome alignment, SNP calling, and recombination
Procedures followed [47], for creation of a fine-scale recombination map of fungal organisms and identification of recombination hotspots. A brief description will be provided below, for a more detailed methodology and explanation of algorithms refer to [47][48][49].
LastZ and MultiZ from the TBA package [50] was used to create the population genome alignment projected against the reference genome, C. purpurea strain 20.1 [18]. Alignments in MAF format were filtered using MafFilter v.1.3.1 [51] following [47]. Final alignments were merged according to the reference genome and subsequently divided into nonoverlapping windows of 100 kb. MafFilter was additionally used to compute genome-wide estimates of nucleotide diversity (Watterson's θ) and Tajima's D in 10 kb windows. Single nucleotide polymorphisms (SNPs) were called by MafFilter from the final alignment. Principal Component Analysis (PCA) and a Maximum-Likelihood phylogeny were conducted with fully resolved biallelic SNPs (Table 3) using the R package SNPRelate v1.18.1 [52] and RAxML v8.2.12 [53] using GTRGAMA and 1000 bootstrap replicates, respectively.
The following process was automated through the creation of a custom python script (https:// github.com/PlantDr430/CSU_scripts/blob/master/Fungal_recombination.py). LDhat [54] was used to estimate population recombination rates (ρ) from the filtered alignment using only fully resolved biallelic positions. A likelihood table was created for the θ value 0.01, corresponding to the genome-wide Watterson's θ of C. purpurea (Table 3; Julien Dutheil per comm), and LDhat was run with 10,000,000 iterations, sampled every 5000 iterations, with a burn-in of 100,000. The parameter ρ relates to the actual recombination rate in haploid organism through the equation ρ = 2N e × r, where N e is the effective population size and r is the per site rate of recombination. However, without knowledge of N e we cannot confidently infer r and thus sought to avoid the bias of incorrect assumptions. Therefore, we reported the population recombination rate (ρ).
Resulting recombination maps were filtered to remove pairs of SNPs for which the confidence interval of the recombination estimate was higher than two times the mean [47]. Average recombination rates were calculated in regions by weighing the average recombination estimate between every pair of SNPs by the physical distance between the SNPs. Using the reference annotation file [18], we calculated the average recombination rates for features in each gene: 1) exons, 2) introns, 3) 500 bp upstream, and 4) 500 bp downstream with a minimum of three filtered SNPs. Flanking upstream and downstream regions correspond to the 5´and 3ŕ egions for forward stranded genes and the 3´and 5´regions for reverse stranded genes. We

PLOS ONE
also calculated the average recombination rate for each intergenic region between the upstream and downstream regions of each gene. Introns were added to the GFF3 file using the GenomeTools package [55]. The original recombination maps produced from LDhat (Julien Dutheil per comm) were converted from bp to kb format for use in LDhot [48] to detect recombination hotspots with 1000 simulations and the parameter-windlist 10 was used to create 20 kb background windows [49]. Only hotspots with a value of ρ between 5 and 100 and width < 20 kb were selected for further analysis [47][48][49].

Statistical and enrichment analyses
Statistics and figures were generated using Python3 modules SciPy v1.3.1, statsmodel v0.11.0, Matplotlib v3.1.1, and seaborn v0.10.0. All multi-test corrections were performed with Benjamini-Hochberg false discovery rate procedure. Enrichment analyses were tested using Fischer's Exact test with a cutoff α = 0.05. Uncorrected p-values were corrected using Benjamini-Hochberg and Bonferroni multi-test correction with a false discovery rate (FDR) cutoff of α = 0.05. Corresponding p-values from correction tests were averaged together to get a final p-value. Enrichment was performed on protein domain names and GO terms. Orthogroups were only associated with a domain or GO term if � 50% of the strains present in the gene cluster had one gene with the term. This process was automated through creation of a custom python script (https://github.com/PlantDr430/CSU_scripts/blob/master/Domain_enrichment.py).

Pangenome analysis
We constructed a pangenome of Claviceps purpurea from 24 isolates representing a collection from three continents and six countries (Table 1). Taking advantage of plentiful isolates

PLOS ONE
available from Canada, we sampled more heavily from different provinces and on different host plants. The principal component and phylogenetic analysis revealed substantial genetic variation among the samples. However, the genetic distances were not correlated with geographic distances, such as LM470 (Canada) and Clav04 (USA) grouping closer to isolates from Europe and the isolate from New Zealand (S1 Fig). In addition, across Canada and USA, isolates from similar regions rarely clustered together and were often intermixed (S1B Fig). These results agree with the results from a multi-locus genotyping of a larger set of samples from Canada and USA [56]. Previous reports [17] showed that C. purpurea isolates had similar genome size ( Table). Within the accessory genome (including lineage-specific orthogroups) we observed 592 (5.6%) orthogroups containing paralogs, with some isolates containing > 20 genes per cluster ( Fig 1C and S1 Table). We utilized multiple gene functional categories to get a deeper understanding of how genes of different function were structured within the pangenome. As a proportion of orthogroups within each pangenome category (core, accessory, and singleton) we found that the core genome was significantly enriched in orthogroups that contained genes with conserved protein domains (conserved) (5,471; 84%), transmembrane domains (transmembrane) (1,038; 16%), peptidase and protease domains (MEROPs) (211, 3.2%), and orthogroups of carbohydrate-active enzymes (CAZys) (212, 3.2%) (P < 0.01, Fisher's exact test, Fig 2A and 2E-2G). Effector proteins play major roles in plant-microbe interactions, often conveying infection potential of the pathogen. A total of 257 predicted effector orthogroups were identified; 100 (38.9%) were core, 143 (55.6%) were accessory, and 14 (5.4%) were singletons. Predicted effectors and orthogroups coding for secreted proteins, which also contribute to host-pathogen interactions, were significantly enriched in the accessory genome (143, 5%; 218, 7.6%; respectively) (P < 0.01, Fisher's exact test, Fig 2C and 2D). Although, the accessory and singleton genomes were largely composed of unclassified orthogroups (1791, 62.8%; 830, 73.4%; respectively) (P < 0.01, Fisher's exact test, Fig 2H). Lastly, we observed that orthogroups which contained secondary (2˚) metabolite genes were similarly represented across all pangenome categories (P > 0.05, Fisher's exact test, Fig 2B).
As expected, core orthogroups were found to be significantly enriched in general housekeeping and basic cellular functions and development such as protein and ATP binding, nucleus and membrane cellular components, and transmembrane transport, metabolic, and oxidation-reduction processes (S2 Table). Protein domains in core orthogroups were significantly enriched for several WD40-repeat domains, P-loop nucleoside triphosphate hydrolase (IPR027417), armadillo-type fold (IPR016024), and a major facilitator (PF07690) (S2 Table). When narrowing the focus to orthogroups with paralogs, core paralogous orthogroups were enriched in cytochrome P450 domains, and domains associated with trehalose activity (S3 Table). In contrast, the accessory genome was only found to be enriched in a fungal acid metalloendopeptidase domain (MER0001399) and the singleton genome had enrichment for a Tc5 transposase DNA-binding domain (PF03221) (S2 Table). Accessory paralogs were found to be enriched in several protein kinases, Myb-like domains, phosphotransferases, as well as DNA integration and a MULE transposase domain (S3 Table). It should be noted that the high

Selection landscape
To further understand the evolution of genes within the pangenome we investigated the positive selection landscape on protein coding genes using 3,628 single-copy core orthologs to compute the ratio of non-synonymous substitutions to synonymous substitutions (dN/dS) ( Table 2). Ratios of dN/dS (omega, ω) can provide information of evolutionary forces shaping an organism as genes with ω > 1 may indicate positive or diversifying selection, ω = 1 may indicate neutral evolution, and ω < 1 may indicate negative or purifying selection [57].

PLOS ONE
functional categories (except 2˚metabolites which was not significantly different than conserved genes), indicating that these genes are frequently experiencing purifying selection.

Recombination landscape
Recombination is also an important potential driver of genome evolution and plays a central role in the adaptability of parasitic organisms to overcome host defenses [58]. Our genomealignments contained 154 of the original 191 scaffolds ( Table 3). The 37 missing scaffolds totaled 222,918 bp (average lengths = 6,192 ± 5,676 bp) and corresponded to 59 genes. Thirtyone of the missing scaffolds contain genes that were only part of the accessory genome of which six scaffolds contained two or more genes (S6 Table), suggesting that these scaffolds represent blocks of genetic material that could be lost or gained from isolate to isolate. Only 1 of the missing scaffolds did not contained any genes. Most of the genes found on these scaffolds encoded conserved domains associated with either reverse transcriptase, integrases, or helicases (S6 Table), which suggest unplaced repetitive content. Although, one scaffold (scaffold 185) did possess a gene encoding a conserved domain for a centromere binding protein (S6 Table). Together these observations may indicate the potential for dispensable chromosomes, as dispensable and mini-chromosomes often contain higher repetitive content [59], however, long-read sequencing is necessary for confirmation.
From our shared alignments of all 24 genomes, we recovered 1,076,901 biallelic SNPs corresponding to a median nucleotide diversity (Watterson's θ) of 0.01196 and a Tajima's D of -0.82522 calculated from 10 kb non-overlapping windows ( Table 3). The resulting SNPs were used to infer the population recombination rate (ρ) from the linkage disequilibrium between SNPs based on a priori specified population mutation rate θ, which was set to 0.01 based on our nucleotide diversity (Watterson's θ) ( Table 3) [47]. The C. purpurea genome recombination landscape was highly variable as some scaffolds showed highly heterogenous landscapes, other scaffolds showed intermixed large peaks of recombination, while others still had more constantly sized peaks across the regions (Fig 4 and S4 Fig). Overall, the mean genomic population recombination rate in C. purpurea was ρ = 0.044. Recombination in specific sequence features and gene type were examined through comparison of mean population recombination rates in exons, introns, 500-bp upstream and downstream of the coding DNA sequence, and intergenic regions based on the annotation of the reference genome (strain 20.1). The distribution of population recombination rates was comparable across different gene features and gene functional categories, although, some significant differences were observed (Fig 5). In general, we found upstream regions to have the lowest recombination rates, while downstream regions have the highest recombination rates (Fig 5). The decreased recombination in upstream regions might be the result of mechanisms trying to conserve promotor regions. This trend was observed across different functional gene categories, except in predicted effector genes where exons showed the highest recombination rates and downstream regions with the lowest, although these were not significantly different (Fig 5B). Across functional categories, secreted genes and transmembrane genes showed the highest recombination rates within each gene feature but were not always significantly different (Fig 5C).
Due to the observation of paralogs (Fig 1) and evidence of tandem gene duplication in C. purpurea [17] the extent recombination might have influenced these events was investigated. Duplicated genes were found to have lower population recombination rates than all other genes within the genome (Fig 5D), suggesting that other factors are influencing gene duplication. Due to the absence of repeat-induced point (RIP) mutation [17], transposable elements (TEs) are likely a contributing factor. To investigate the association of duplicated genes with TEs we calculated the average distance of genes to transposons (DNA and long terminal repeat

PLOS ONE
(LTR) retrotransposons) and the average number of flanking transposons. Results showed duplicated genes were significantly closer to LTRs and had significantly more flanking LTRs than predicted effectors and other genes (P < 0.0001, multi-test corrected Mann-Whitney U Test, S5 Fig). In addition, for all genes examined LTRs were significantly closer than DNA transposons (genes (P < 0.0001, multi-test corrected Mann-Whitney U Test, S5 Fig).
As distinct peaks of recombination were observed (Fig 4 and S4 Fig), LDhot was used to call statistically significant recombination hotspots by analysis of the intensity of recombination rates in 3 kb (1 kb increments) windows compared to background recombination rates in 20 kb windows [47][48][49]. After implementing a cutoff of ρ � 5 and length of 20 kb [48] only five recombination hotspots were retained, ranging from 11 kb to 18.5 kb in length (Fig 6). A recombination hotspot was identified between the lpsA1 and lpsA2 genes of the ergoline biosynthetic cluster, suggesting that this gene duplication event was likely the result of recombination (Fig 6D). Association of gene functional category and TEs within hotspots varied between regions. Some hotspots showed a greater association with duplicated genes and TEs (Fig 6B-6D), while others showed no association with duplicated genes (Fig 6E) or no association with TEs ( Fig 6A). In general, genes with conserved protein domains showed the highest presence within hotspots (S6 Fig). It should be noted that some unclassified genes and genes with conserved protein domains associated with hotspots were also found to be overlapping regions identified as repeats (Fig 6A-6C and 6E). Protein domains found within these genes were associated with ankyrin (IPR002110) and tetratricopeptide (IPR013026) repeats. Only 5 of the 846 duplicated genes [reported in 17] found throughout the reference genome were located within predicted recombination hotspots (Fig 6 and S6 Fig). While Wyka et al. [17] showed that gene cluster expansion was prevalent among predicted effectors, only one non-duplicated predicted effector (CCE30212.1) was found located within a recombination hotspot (Fig 6C). Together these results suggest that while recombination may result in important gene duplication, it is not the primary driver of gene duplication within C. purpurea.

Discussion
The establishment of a Claviceps purpurea pangenome from 24 isolates, as well as the detection of core genes with signatures of positive selection and analysis of the recombination landscape have provided knowledge into how high recombination rates and gene duplication are driving the genomic evolution and adaptation of the species.
The pangenome of C. purpurea reveals a large accessory genome with 37.78% accessory orthogroups (27.05% accessory + 10.73% singleton) in comparison to four model fungal pangenomes (Saccharomyces cerevisiae, Candida albicans, Cryptococcus neoformans, and Aspergillus fumigatus), which found around 9-19% of their genes in the accessory genome [5]. Our results are more comparable to the pangenome of the fungal pathogen Zymoseptoria tritici which had an accessory genome comprised of 40% (30% accessory + 10% singleton) of genes [6]. Similar to C. purpurea, Zymoseptoria tritici is a globally distributed fungal pathogen of wheat, suggesting that fungal species with similar geographical distributions could possess comparable pangenome structures as they are under similar evolutionary pressures. Shared ecological habitats and lifestyles have been reported to influence pangenome sizes in bacteria [4]. In fact, C. purpurea and Z. tritici both experienced enrichment of predicted effector orthogroups in the accessory genome and enrichment of carbohydrate-active enzymes (CAZys) orthogroups in the core genome (Fig 2) [6], conveying a comparable similarity between gene functions as both organisms are pathogens of wheat. In addition, Badet et al. [6] suggested that the large accessory genome of Z. tritici is likely maintained due to TE activity and a large effective population size as a result of observations of high SNP density, rapid decay in linkage disequilibrium, and high recombination rates [47,60,61]. The same mechanisms could also explain the large accessory genome observed in C. purpurea.
We identified 37 missing scaffolds in our population genome alignment with 31 of these containing genes only present in the accessory genome, suggesting the potential for blocks of DNA that could be lost/gained between isolates. Of these accessory scaffolds 15 contained genes encoding conserved domains associated with either reverse transcriptase, integrases, or

PLOS ONE
helicases and one scaffold possessed a gene encoding a conserved domain for a centromere binding protein (S6 Table). Together these could indicate the potential for dispensable minichromosomes, as dispensable and mini-chromosomes often contain higher repetitive content [59]. However, these unplaced contigs may be assembly artifacts. Due to our Illumina based assemblies we did not process these elements further but believe that these are important aspects of C. purpurea evolution and should be a focal point of future research with the advantage of long-read sequencing to understand their function more confidently. Due to these transcriptase rich unplaced scaffolds, the lack of RIP, and observation of TEs with 0% divergence [17], we believe transposons and/or transcriptases are influencing the evolution of the accessory genome in C. purpurea.
We observed an abundance of orthogroups containing paralogs (8.6%). This presence of gene duplication and association with LTR retrotransposons (S5 Fig) could be contributing to the large size of the accessory genome, potentially through pseudogenization and/or neofunctionalization. In fact, unclassified genes had the highest ω (dN/dS) ratios (Fig 3). In addition, the abundance of duplication in accessory unclassified genes [17] and their small sizes (S2 Fig)  further suggests the presence of pseudogenization and/or neofunctionalization. Badet et al. [6] suggested that TEs were likely contributing to the Z. tritici accessory genome due to the correlation of TE content with genome size and observations of transcribed TEs. We observed a similar correlation of TE content with genome size (P = 0.004, Adj. R 2 = 0.28), however, our genome sizes and TE content (30.5 Mb-32.1 Mb, 8.42% -10.87%, respectively) were not as variable as in Z. tritici, which also had a twofold higher TE content [6]. This suggests that TEs play a more important role in Z. tritici genome expansion, however, only 0.2% of the orthogroups in Z. tritici contained paralogs suggesting that gene duplication is not as common in Z. tritici as it is in C. purpurea (8.6% paralogs). The lack of gene duplication in Z. tritici is likely due to the presence of RIP [62], which should also reduce TE expansion through silencing [63][64][65]. While we lack RNAseq data to observe TE transcription within C. purpurea, observations of TEs with 0% divergence in C. purpurea [17] suggest recent TE activity. The observed reduced association of recombination with duplicated genes (Fig 6D) and association of duplicated genes with LTR transposons (S5 Fig) would suggest that gene duplication in C. purpurea is mediated in part by transposon activity.
Due to the potential for transposon mediated gene duplication, it was remarkable to find relatively low TE content (~8-10%) within C. purpurea, especially in the absence of RIP. Other genomic mechanisms, such as recombination, may limit TE expansion and increases in genome size. Tiley and Burleigh [66] found a strong negative correlation between global recombination rate, genome size and LTR retrotransposon proportion across 29 plant species, indicating that higher recombination rates actively reduce genome size likely through the removal of LTR elements. A similar function may be affecting LTR content in C. purpurea, which would explain the observed differences in LTR content between Claviceps section Claviceps (low LTR content, RIP absent) and Claviceps sections Pusillae, Paspalorum, and Citrinae (high LTR content, RIP present) [17].
On average we observed a twofold higher mean population recombination rate (ρ = 0.044) in C. purpurea than Z. tritici (ρ = 0.0217) and tenfold higher than Z. ardabiliae (ρ = 0.0045) [47]. As ρ is a function of effective population size and recombination rate per site (ρ = 2N e × r), these increases could be the result of the increment in recombination rate per site (r) and/or effective population size (N e ). Differences in ρ between the two Zymoseptoria species was postulated to be due to increased recombination rates per site as it was found that the nucleotide diversity (Watterson's θ = 2 N e x μ, where μ is mutation rate) was 1.6 times higher in Z. tritici (0.0139) than Z. ardabiliae (0.00866). Under an assumption that both Z. tritici and Z. ardabiliae have comparable mutation rates, N e of Z. tritici would only be 1.6 times higher than Z. ardabiliae, therefore, the 5-fold higher ρ would likely be caused by higher recombination rates per site [47]. Our observed Watterson's θ of 0.012 in C. purpurea (Table 2) is comparable to Z. tritici, suggesting that if mutation rates and effective populations sizes are comparable than the twofold increase in ρ is likely influenced by higher recombination rates per site in C. purpurea. Although, Z. tritici is a heterothallic organism while C. purpurea is homothallic [67] but C. purpurea does frequently out-cross in nature [19,68], suggesting that these factors may provide a difference in effective population sizes between these organisms. In addition, mutation rates might be higher in Z. tritici, than C. purpurea, due to the presence of RIP, which identifies repeat/duplicated sequences within a genome and introduces C:G to T:A mutations to effectively silence these regions [63][64][65]. It has also been reported that RIP can "leak" into neighboring non-repetitive regions and introduce mutations, thus, accelerating the rate of mutations, particularly those in closer proximity to repeat regions [69][70][71]. If the mutation rate is increased in Z. tritici due to RIP "leakage" the nucleotide diversity in Z. tritici could be the result of high mutation rates, whereas the nucleotide diversity in C. purpurea could be influenced by higher effective population size and/or recombination rates per site. Our positive selection analysis did reveal a gene with evidence of positive selection (ω = 1.52) that is related to meiotic recombination protein rec12, which is known to catalyze the formation of dsDNA breaks that initiate homologous recombination in meiosis in yeast [72]. Kan et al. [72] further determined that the frequency of dsDNA breaks catalyzed by rec12 significantly increased the frequency of intergenic recombination. In both plants [66] and Z. tritici [73], higher recombination rates were found to increase the efficacy of purifying selection. Similarly, C. purpurea had an overall trend of purifying selection with skewness towards lower ω values (Fig 3) and an observed correlation of higher population recombination rates around genes with lower ω ratios (S7 Fig), further suggesting the potential for higher recombination rates in C. purpurea.
Additional support for higher recombination rates per site in C. purpurea could be extrapolated from recombination hotspots, or lack thereof. While we observed evidence of a heterogenous recombination landscape with several scaffolds showing large peaks in population recombination rates (Fig 4 and S4 Fig), we only predicted five recombination hotspots (Fig 6), which is in stark contrast to the~1,200 hotspots identified in Z. tritici [74]. On average, we did observe higher population recombination rates across scaffolds compared to the rates observed across chromosomes of Zymoseptoria [47], suggesting that the background recombination rate in C. purpurea is higher and "flatter", potentially limiting the detection of hotspots [48]. Overall, this indicates that C. purpurea exhibits high recombination rates per site, which potentially helps defend against TE expansion.
While these higher recombination rates are likely influencing the purifying selection observed in regions of C. purpurea genome, it may not be the only reason we were unable to detect predicted core effector genes with signatures of positive selection (Fig 3). Specifically, it is possible that positive selection is occurring but the whole gene dN/dS ratio is <1. Predicted effectors may have sites under positive selection, but that are not detected by this whole gene analysis of positive selection. Therefore, our results represent a lack of power to positive selection in predicted effector genes, and not actual evidence for a lack of selection. Another potential explanation is that the ancestral state of Claviceps purpurea is plant endophytism [8] and is closely related to several mutualistic grass endophytes (i.e. Epichloe, Balansia, Atkinsonella) which have been known to provide beneficial aspects to their hosts mostly through production of secondary metabolites and plant hormones [75][76][77]. Furthermore, Wäli et al. [24] classified C. purpurea as a conditional defense mutualist with its plant host, as they found sheep avoided grazing infected grasses and observed that infection rates were higher in grazed pastures compared to ungrazed fields. Other researchers have observed neutral to positive effects of seed set, seed weight, and plant growth on infected plants compared to uninfected plants [24][25][26]78].
These factors, along with the broad host range of C. purpurea (400+ grass species) and lack of known crop resistance (R) genes, could suggest a lack of strong selection for resistance to C. purpurea in grass species [24]. This could help explain the lack of positive selection observed in predicted core effector genes, implying that effectors are not under strong selection pressure to compete in the evolutionary arms race against host defense. However, it should be noted that positive selection analyses are computed from single-copy core orthologs. Observations of significant enrichment of predicted effector genes in the accessory genome of C. purpurea and duplication of effector gene cluster [17] could implicate their role in diversity of infection potential [7], however, no host specific races of C. purpurea have been identified.
While further research is needed to better characterize the accessory genome of C. purpurea it appears that TE mediated-gene duplication and frequent recombination are likely playing a role in the expansion of C. purpurea's accessory genome and may be influencing the success of C. purpurea. In addition, all members of Claviceps section Claviceps, which contain grass pathogens that have extended geographical distributions and host ranges, have genomes that lack RIP, exhibit gene duplication, and have comparable TE content [17], suggesting that the genomic mechanisms identified in this study might be characteristic of section Claviceps as a whole.

Conclusion
Overall, we observed that~38% of the Claviceps purpurea pangenome is accessory, which is likely influenced by a large effective population size, frequent recombination, and TE mediated gene duplication. Pseudogenization and neofunctionalization might also be contributing due to the observed TE activity, observations of higher ω ratios, signatures of positive selection in core single-copy unclassified genes, and small size of many accessory unclassified genes. Due to a lack of RIP, prolific TE expansion is likely kept under controlled by high recombination rates, which subsequently may be influencing the overall trend of purifying selection.   Table. BLAST results of single-copy core orthologs with an ω (dN/dS) � 1.